Skip to content

Conversation

@subhash686
Copy link
Contributor

Issue #, if available:
#1588

Description of changes:

  • Added CRaC support in powertools-common, following the guide and blogs mentioned on the ticket.
  • Added automatic priming for powertools-metrics by preloading all the required classes, the list of classes is fetched from the JVM at compile time.
  • Added manual priming for dynamodb persistence store

Checklist

Breaking change checklist

RFC issue #:

  • Migration process documented
  • Implement warnings (if it can live side by side)

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@subhash686
Copy link
Contributor Author

@phipag Looks like v2 is now merged and released, so I resolved conflicts to rebase to main. Let me know if you have any other release plans for this.

@phipag
Copy link
Contributor

phipag commented Jun 12, 2025

Thanks @subhash686 and apologize the delay. I was very busy with the v2 release.

Your PR is next on my list and we'll try to get it merged for the next minor release 2.1.0.

@phipag
Copy link
Contributor

phipag commented Jun 25, 2025

Hey @subhash686, I didn't forget you. This week is very busy but I'll do my best to test your implementation very soon.

In the meantime, can you have a look at the 4 Sonar issued and either resolve them or mark them as not applicable if you think that is the case?

@subhash686
Copy link
Contributor Author

@phipag Fixed the sonar issues, I did not notice the comment earlier, thanks for pointing it out.

@phipag
Copy link
Contributor

phipag commented Jul 7, 2025

Hey @subhash686 , I am reviewing your PR now. Can you provide some details how you generated the txt file of the classes loaded? I see you say:

the list of classes is fetched from the JVM at compile time.

Can you provide the compilation command you used for this?

Ideally, I would like to reproduce the process for how to generate and then use the txt file of loaded classes.

@subhash686
Copy link
Contributor Author

subhash686 commented Jul 7, 2025

Hey @subhash686 , I am reviewing your PR now. Can you provide some details how you generated the txt file of the classes loaded? I see you say:

the list of classes is fetched from the JVM at compile time.

Can you provide the compilation command you used for this?

Ideally, I would like to reproduce the process for how to generate and then use the txt file of loaded classes.

Thanks @phipag . Here are the steps I followed

  • Ran following commands to generate the loaded classes in the project root directory
    > export MAVEN_OPTS="-Xlog:class+load=info:classloaded.txt"
    > mvn install -pl powertools-metrics -am

  • File has each line in the format [0.054s][info][class,load] java.lang.Object source: shared objects file

  • Used regex to remove all the prefix extra characters ^\[[\[\]0-9.a-z,]+ and replace the source text using regex( source: )[0-9a-z:/._$-]+ which is at the end of each line.

  • Moved the file to powertools-metrics resources directory.

@phipag
Copy link
Contributor

phipag commented Jul 7, 2025

Thanks @subhash686 for providing this context. I will reproduce it and run a load test of this code. Will share the results here soon and we can move forward.

@phipag
Copy link
Contributor

phipag commented Jul 8, 2025

Hey @subhash686,

here is my first update. I tested the classesload.txt (automatic priming) implementation that you contributed and found some issues:

  1. This is not running when included in the LambdaMetricsAspect that is annotated with @Aspect. I moved the priming code for loading the classes into the EmfMetricsLogger.java which worked! I verified this by adding a log statement in the beforeCheckpoint and afterRestore methods and verified in a Snapstart deployment that this gets logged.
  2. The classesloaded.txt file does not represent the classes loaded at runtime. Since you generated the file during mvn install -pl powertools-metrics -am this will yield the loaded classes during all maven stages part of the install goal. This does not achieve the intended goal which is to pre-load all classes that are essential at runtime of the application. For example, if you search in the file for "MetricsFactory" nothing will show up but this class is mandatory to be loaded at runtime when using the Metrics module.
    • What we do for GraalVM is documented here. We run the GraalVM tracing agent as part of the unit tests. This way we can actually get the metadata for the loaded classes at runtime (or at least an approximation of it). I envision we can do the same thing here by tracing the loaded classes during execution of the unit tests.
    • If you checkout @msailes example here it will have those classes like MetricsLogger etc.

For issue (2), I think this needs a completely different approach for generating the classesloaded.txt. We need to implement a mechanism to capture the classes at runtime not the classes used by maven during compilation. Possibly, the GraalVM mechanism can be re-used in some form?

I will now review the Idempotency manual priming that you implemented and will get back to you again. Just wanted to share these preliminary findings. Let me know if you have any suggestion / idea about this. I am happy to join a meeting with you to brainstorm more?

replaced build time class list with run time class list
@subhash686
Copy link
Contributor Author

Hi @phipag
Thanks so much for the time you spent adjusting the code and thorough suggestions.

For (2), I created a lambda handler with FlushMetrics annotation and used it in a test to generated the classes again. Used GraalVM profile to run all tests with the -Xlog JVM option. I see the runtime classes now with some extra test classes.
I replaced the classesloaded file for now, so we can continue testing. I am happy to meet to brainstorm the approach to generate the class list.

I will fix (1) once you get a chance to review it all.

@phipag
Copy link
Contributor

phipag commented Jul 9, 2025

Hey @subhash686,

I reviewed and tested the DynamoDB priming (see my comments above). I made these changes locally and ran a load test with your priming code and without the priming code. I observe positive results that confirm priming actually improves the restore duration. Here are my results:

Idempotency Snapstart restore duration

Tested using example at https://github.com/aws-powertools/powertools-lambda-java/blob/984036ca2e3afb92b2c838bcfa2e48a51c3d7a64/examples/powertools-examples-idempotency

Priming achieves ~27% better restore duration.

Without priming

snapstart_restore_count avg_restore_ms min_restore_ms max_restore_ms p50_restore_ms p90_restore_ms p95_restore_ms p99_restore_ms
401 673.2549 389.51 1081.39 675.0206 766.3812 820.28 894.8018

With priming

snapstart_restore_count avg_restore_ms min_restore_ms max_restore_ms p50_restore_ms p90_restore_ms p95_restore_ms p99_restore_ms
402 440.2612 177.07 719.34 438.3243 528.9348 556.596 655.0805

For (2), I created a lambda handler with FlushMetrics annotation and used it in a test to generated the classes again. Used GraalVM profile to run all tests with the -Xlog JVM option. I see the runtime classes now with some extra test classes.
I replaced the classesloaded file for now, so we can continue testing.

Nice, the file looks much better now although it also includes the unit test libraries like org.junit which is expected. We have the same problem in GraalVM and rely on manually cleaning this up. In this case, it might not be a big deal because during the beforeCheckpoint hook, we will simply ignore classes that are not found at runtime.

I am happy to meet to brainstorm the approach to generate the class list.

Happy to hear that. Let's setup a meeting to brainstorm the best approach together for this class loading. Can you send an email to [email protected] and share your timezone and availability (I am based in CET timezone)? I will send out a meeting invite then 🚀

@phipag
Copy link
Contributor

phipag commented Jul 9, 2025

I performed the same load test for the Classloading using the new txt file you provided @subhash686. Here are my results:

Metrics priming (Classloading)

Tested using example at https://github.com/aws-powertools/powertools-lambda-java/blob/984036ca2e3afb92b2c838bcfa2e48a51c3d7a64/examples/powertools-examples-core-utilities/sam (removed tracer and logger).

We can see ~7% better performance using classloading priming.

Without priming

snapstart_restore_count avg_restore_ms min_restore_ms max_restore_ms p50_restore_ms p90_restore_ms p95_restore_ms p99_restore_ms
400 634.9481 443.29 966.02 623.1474 730.4813 778.7357 894.8018

With priming

snapstart_restore_count avg_restore_ms min_restore_ms max_restore_ms p50_restore_ms p90_restore_ms p95_restore_ms p99_restore_ms
405 586.1477 345.14 944.56 581.0391 678.4024 718.1742 831.0078

@phipag
Copy link
Contributor

phipag commented Jul 9, 2025

Next steps:

  • Address issues (1) and (2)
  • Brainstorm in a meeting (@phipag and @subhash686) to understand best process for generating classloaded.txt files
  • Address PR comments (move Idempotency priming logic to DDB datastore and increase priming scope)
  • Merge PR

Copy link
Contributor

@phipag phipag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @subhash686. I added some cosmetic comments in the Priming documentation. I am deploying your changes to a Snapstart Lambda for testing now. I will get back with the results later.

Copy link
Contributor

@phipag phipag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey @subhash686, I tested this code again in my AWS account.

✅ Idempotency invoke priming works like a charm
🚧 MetricsFactory needs refinement due to having a private constructor (see my comments)
🚧 ClassPreLoader has a bug. It exits the class pre-loading as soon as one class is not found which we said is expected due to including the test classes as well favoring automation over manual cleanup

Let me know if something is unclear and in case of questions. 🚀 We made a lot of progress here already!

@subhash686
Copy link
Contributor Author

@phipag Thanks for your thorough review as always. I pushed the updates.
I checked the 2 sonar issues and am inclined to accept those. Please let me know if you think otherwise.

@phipag
Copy link
Contributor

phipag commented Jul 30, 2025

@phipag Thanks for your thorough review as always. I pushed the updates. I checked the 2 sonar issues and am inclined to accept those. Please let me know if you think otherwise.

Thanks very much for addressing my comments so quickly @subhash686. I resolved all comments which are fixed now. Please review the open comments.

Regarding Sonarcube, I accepted one issue. Let's address the second one in the next commit. We can extract the class loading into a method like loadClassIfAvailable(className).

I tested the code again after your updates in my AWS account. All Snapstart hooks run as expected now without errors.

@subhash686
Copy link
Contributor Author

@phipag Thanks for your thorough review as always. I pushed the updates. I checked the 2 sonar issues and am inclined to accept those. Please let me know if you think otherwise.

Thanks very much for addressing my comments so quickly @subhash686. I resolved all comments which are fixed now. Please review the open comments.

Regarding Sonarcube, I accepted one issue. Let's address the second one in the next commit. We can extract the class loading into a method like loadClassIfAvailable(className).

I tested the code again after your updates in my AWS account. All Snapstart hooks run as expected now without errors.

Fixed this Sonarqube finding.

@sonarqubecloud
Copy link

Copy link
Contributor

@phipag phipag left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@subhash686, I think this PR is ready to be merged now. I would like to congratulate you on this major contribution. This is laying the foundation for adding different CRaC priming techniques across this project. Also many thanks for documenting this process in the Priming.md doc. 🚀

In the following days, I will create follow-up issues on the items we discussed:

  • Sub-issues for each Powertools utility referencing your Priming documentation
  • An issue for automating the maintenance of the classesloaded.txt file through a GitHub workflow (implementation details to be defined).

@subhash686
Copy link
Contributor Author

@phipag Thank you for you support throughout. I had a great experience working on this feature.
I will continue to look into Github workflow implementation ideas.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: Coming soon

Development

Successfully merging this pull request may close these issues.

Add support for CRaC (AWS Snapstart)

2 participants